Hands-on Exercise 10.1: Process and Visualize Flow Data

Author

Nguyen Bao Thu Phuong

Published

October 27, 2024

Modified

October 26, 2024

1 Overview

Spatial interaction represents the movement of people, goods, or information between locations in geographical space, covering everything from freight shipments and flight schedules to rush hour traffic and pedestrian flow.

Each spatial interaction is composed of an origin/destination pair, represented as a cell in a matrix where rows correspond to origin locations (centroids) and columns to destination locations (centroids). This structure is known as an origin/destination (OD) matrix, or spatial interaction matrix.

In this hands-on exercise, we explore how to build an OD matrix using the “Passenger Volume by Origin Destination Bus Stops” dataset from the LTA DataMall. By the end, you will be able to:

  • Import and extract OD data for a specific time interval,

  • Import and save geospatial data (e.g., bus stops and mpsz) into sf tibble data frames,

  • Add planning subzone codes to the bus stops sf tibble,

  • Create desire line geospatial data from the OD data, and

  • Visualize passenger volume between origin and destination bus stops using the desire line data.

2 Getting Started

For this exercise, five R packages will be used:

  • sf: For importing, integrating, processing, and transforming geospatial data.

  • tidyverse: For importing, integrating, wrangling, and visualizing data.

  • tmap: For creating elegant, cartographic-quality thematic maps.

  • stplanr: Offers functions for common transport planning tasks, such as downloading and cleaning transport datasets, creating geographic “desire lines” from origin-destination (OD) data, route assignment (both locally and through services like CycleStreets.net), calculating route segment attributes (e.g., bearing, flow), and conducting “travel watershed” analysis.

  • DT: Provides an R interface to the JavaScript DataTables library, allowing R data objects (matrices or data frames) to be displayed as interactive HTML tables with filtering, pagination, sorting, and more features.

pacman::p_load(tmap, sf, DT, stplanr, tidyverse)

3 Prepare the Flow data

3.1 Import the OD data

First we import the Passenger Volume by Origin Destination Bus Stops data set downloaded from LTA DataMall using read_csv() of readr package.

odbus <- read_csv("data/aspatial/origin_destination_bus_202409.csv")
Rows: 5721503 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (5): YEAR_MONTH, DAY_TYPE, PT_TYPE, ORIGIN_PT_CODE, DESTINATION_PT_CODE
dbl (2): TIME_PER_HOUR, TOTAL_TRIPS

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

We display the odbus tibble data table by using the code chunk below.

glimpse(odbus)
Rows: 5,721,503
Columns: 7
$ YEAR_MONTH          <chr> "2024-09", "2024-09", "2024-09", "2024-09", "2024-…
$ DAY_TYPE            <chr> "WEEKENDS/HOLIDAY", "WEEKENDS/HOLIDAY", "WEEKENDS/…
$ TIME_PER_HOUR       <dbl> 20, 19, 23, 14, 14, 18, 9, 15, 7, 21, 7, 7, 8, 10,…
$ PT_TYPE             <chr> "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "…
$ ORIGIN_PT_CODE      <chr> "46009", "59091", "54181", "61039", "44629", "7720…
$ DESTINATION_PT_CODE <chr> "46249", "59419", "54201", "61111", "44009", "6441…
$ TOTAL_TRIPS         <dbl> 586, 54, 72, 256, 902, 6, 5, 15, 6, 4, 76, 36, 3, …

A quick check of the odbus tibble data frame reveals that values in the ORIGIN_PT_CODE and DESTINATION_PT_CODE columns are in numeric format. The code chunk below converts these values to character data type.

odbus$ORIGIN_PT_CODE <- as.factor(odbus$ORIGIN_PT_CODE)
odbus$DESTINATION_PT_CODE <- as.factor(odbus$DESTINATION_PT_CODE) 

3.2 Extract the study data

For the purpose of this exercise, we extract commuting flows on weekday and between 6 and 9 o’clock.

odbus6_9 <- odbus %>%
  filter(DAY_TYPE == "WEEKDAY") %>%
  filter(TIME_PER_HOUR >= 6 &
           TIME_PER_HOUR <= 9) %>%
  group_by(ORIGIN_PT_CODE,
           DESTINATION_PT_CODE) %>%
  summarise(TRIPS = sum(TOTAL_TRIPS))
`summarise()` has grouped output by 'ORIGIN_PT_CODE'. You can override using
the `.groups` argument.

Table below shows the content of odbus6_9.

datatable(odbus6_9)
Warning in instance$preRenderHook(instance): It seems your data is too big for
client-side DataTables. You may consider server-side processing:
https://rstudio.github.io/DT/server.html

We save the output in rds format for future used.

write_rds(odbus6_9, "data/rds/odbus6_9.rds")

The code chunk below import the save odbus6_9.rds into R environment.

odbus6_9 <- read_rds("data/rds/odbus6_9.rds")

4 Working with Geospatial Data

For this exercise, two geospatial datasets will be used:

  • BusStop: Contains the locations of bus stops as of the last quarter of 2022.

  • MPSZ-2019: Provides the sub-zone boundaries from the URA Master Plan 2019.

Both datasets are in ESRI shapefile format.

4.1 Import Geospatial Data

We import the 2 datasets into R environment using below code chunks.

busstop <- st_read(dsn = "data/geospatial",
                   layer = "BusStop") %>%
  st_transform(crs = 3414)
Reading layer `BusStop' from data source 
  `C:\thuphuong1110\ISSS626-GAA\Hands-on_Ex\Hands-on_Ex10\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 5166 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 3970.122 ymin: 26482.1 xmax: 48285.52 ymax: 52983.82
Projected CRS: SVY21
mpsz <- st_read(dsn = "data/geospatial",
                   layer = "MPSZ-2019") %>%
  st_transform(crs = 3414)
Reading layer `MPSZ-2019' from data source 
  `C:\thuphuong1110\ISSS626-GAA\Hands-on_Ex\Hands-on_Ex10\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS:  WGS 84
mpsz
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21 / Singapore TM
First 10 features:
                 SUBZONE_N SUBZONE_C       PLN_AREA_N PLN_AREA_C       REGION_N
1              MARINA EAST    MESZ01      MARINA EAST         ME CENTRAL REGION
2         INSTITUTION HILL    RVSZ05     RIVER VALLEY         RV CENTRAL REGION
3           ROBERTSON QUAY    SRSZ01  SINGAPORE RIVER         SR CENTRAL REGION
4  JURONG ISLAND AND BUKOM    WISZ01  WESTERN ISLANDS         WI    WEST REGION
5             FORT CANNING    MUSZ02           MUSEUM         MU CENTRAL REGION
6         MARINA EAST (MP)    MPSZ05    MARINE PARADE         MP CENTRAL REGION
7                   SUDONG    WISZ03  WESTERN ISLANDS         WI    WEST REGION
8                  SEMAKAU    WISZ02  WESTERN ISLANDS         WI    WEST REGION
9           SOUTHERN GROUP    SISZ02 SOUTHERN ISLANDS         SI CENTRAL REGION
10                 SENTOSA    SISZ01 SOUTHERN ISLANDS         SI CENTRAL REGION
   REGION_C                       geometry
1        CR MULTIPOLYGON (((33222.98 29...
2        CR MULTIPOLYGON (((28481.45 30...
3        CR MULTIPOLYGON (((28087.34 30...
4        WR MULTIPOLYGON (((14557.7 304...
5        CR MULTIPOLYGON (((29542.53 31...
6        CR MULTIPOLYGON (((35279.55 30...
7        WR MULTIPOLYGON (((15772.59 21...
8        WR MULTIPOLYGON (((19843.41 21...
9        CR MULTIPOLYGON (((30870.53 22...
10       CR MULTIPOLYGON (((26879.04 26...

Note:

  • st_read() function of sf package is used to import the shapefile into R as sf data frame.

  • st_transform() function of sf package is used to transform the projection to crs 3414.

The code chunk below write mpsz sf tibble data frame into an rds file for future use.

mpsz <- write_rds(mpsz, "data/rds/mpsz.rds")

5 Geospatial data wrangling

5.1 Combine Busstop and mpsz

The code chunk below populates the planning subzone code (SUBZONE_C) from the mpsz sf data frame into the busstop sf data frame. The st_intersection() function is used to perform a point-in-polygon overlay, resulting in a point sf object.

Next, select() from the dplyr package is used to retain only BUS_STOP_N and SUBZONE_C in the busstop_mpsz sf data frame. Note that five bus stops are excluded in the result data frame because they fall outside Singapore’s boundary.

busstop_mpsz <- st_intersection(busstop, mpsz) %>%
  select(BUS_STOP_N, SUBZONE_C) %>%
  st_drop_geometry()
Warning: attribute variables are assumed to be spatially constant throughout
all geometries
datatable(busstop_mpsz)

Before moving to the next step, we save the output into rds format.

write_rds(busstop_mpsz, "data/rds/busstop_mpsz.rds")  

Next, we append the planning subzone code from busstop_mpsz data frame onto odbus6_9 data frame.

od_data <- left_join(odbus6_9 , busstop_mpsz,
            by = c("ORIGIN_PT_CODE" = "BUS_STOP_N")) %>%
  rename(ORIGIN_BS = ORIGIN_PT_CODE,
         ORIGIN_SZ = SUBZONE_C,
         DESTIN_BS = DESTINATION_PT_CODE)
Warning in left_join(odbus6_9, busstop_mpsz, by = c(ORIGIN_PT_CODE = "BUS_STOP_N")): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 16245 of `x` matches multiple rows in `y`.
ℹ Row 674 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
  "many-to-many"` to silence this warning.

Before continue, it is a good practice to check for duplicating records.

duplicate <- od_data %>%
  group_by_all() %>%
  filter(n()>1) %>%
  ungroup()
duplicate
# A tibble: 1,452 × 4
   ORIGIN_BS DESTIN_BS TRIPS ORIGIN_SZ
   <chr>     <fct>     <dbl> <chr>    
 1 09047     01611         3 ORSZ02   
 2 09047     01611         3 ORSZ02   
 3 09047     02049        52 ORSZ02   
 4 09047     02049        52 ORSZ02   
 5 09047     02089        32 ORSZ02   
 6 09047     02089        32 ORSZ02   
 7 09047     02151        87 ORSZ02   
 8 09047     02151        87 ORSZ02   
 9 09047     02161        24 ORSZ02   
10 09047     02161        24 ORSZ02   
# ℹ 1,442 more rows

The code chunk below retain the unique records if duplication is found.

od_data <- unique(od_data)

We check again confirm if the duplicating records issue has been addressed fully.

duplicate <- od_data %>%
  group_by_all() %>%
  filter(n()>1) %>%
  ungroup()
duplicate
# A tibble: 0 × 4
# ℹ 4 variables: ORIGIN_BS <chr>, DESTIN_BS <fct>, TRIPS <dbl>, ORIGIN_SZ <chr>

Next, we update od_data data frame with the planning subzone codes.

od_data <- left_join(od_data , busstop_mpsz,
            by = c("DESTIN_BS" = "BUS_STOP_N")) 
Warning in left_join(od_data, busstop_mpsz, by = c(DESTIN_BS = "BUS_STOP_N")): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 162 of `x` matches multiple rows in `y`.
ℹ Row 673 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
  "many-to-many"` to silence this warning.
duplicate <- od_data %>%
  group_by_all() %>%
  filter(n()>1) %>%
  ungroup()
od_data <- unique(od_data)
od_data <- od_data %>%
  rename(DESTIN_SZ = SUBZONE_C) %>%
  drop_na() %>%
  group_by(ORIGIN_SZ, DESTIN_SZ) %>%
  summarise(MORNING_PEAK = sum(TRIPS))
`summarise()` has grouped output by 'ORIGIN_SZ'. You can override using the
`.groups` argument.

It is time to save the output into an rds file format.

write_rds(od_data, "data/rds/od_data_fii.rds")
od_data_fii <- read_rds("data/rds/od_data_fii.rds")

6 Visualize Spatial Interaction

In this section, we explore how to prepare a desire line using stplanr package.

6.1 Remove intra-zonal flows

We will not plot the intra-zonal flows. The code chunk below is used to remove intra-zonal flows.

od_data_fij <- od_data[od_data$ORIGIN_SZ!=od_data$DESTIN_SZ,]
write_rds(od_data_fij, "data/rds/od_data_fij.rds")
od_data_fij <- read_rds("data/rds/od_data_fij.rds")

6.2 Create desire lines

The code chunk below use od2line() of stplanr package to create the desire lines.

flowLine <- od2line(flow = od_data_fij, 
                    zones = mpsz,
                    zone_code = "SUBZONE_C")
Creating centroids representing desire line start and end points.
write_rds(flowLine, "data/rds/flowLine.rds")
flowLine <- read_rds("data/rds/flowLine.rds")

6.3 Visualize the desire lines

The code chunk below is used to visualise the resulting desire lines.

tm_shape(mpsz) +
  tm_polygons() +
flowLine %>%  
tm_shape() +
  tm_lines(lwd = "MORNING_PEAK",
           style = "quantile",
           scale = c(0.1, 1, 3, 5, 7, 10),
           n = 6,
           alpha = 0.3)
Warning in g$scale * (w_legend/maxW): longer object length is not a multiple of
shorter object length

Warning: Be patient as the rendering process takes more time because of the transparency argument (alpha)

When flow data are messy and highly skewed, as seen above, it’s more effective to focus on selected flows—such as those greater than or equal to 5000, as shown below.

tm_shape(mpsz) +
  tm_polygons() +
flowLine %>%  
  filter(MORNING_PEAK >= 5000) %>%
tm_shape() +
  tm_lines(lwd = "MORNING_PEAK",
           style = "quantile",
           scale = c(0.1, 1, 3, 5, 7, 10),
           n = 6,
           alpha = 0.3)
Warning in g$scale * (w_legend/maxW): longer object length is not a multiple of
shorter object length
Warning in g$scale * (x/maxW): longer object length is not a multiple of
shorter object length